Transparent Real-Time Monitoring in MPI
Authors
Abstract
MPI has emerged as a popular way to write architecture-independent parallel programs. By modifying an MPI library and associated MPI run-time environment, transparent extraction of timestamped information is possible. The wall-clock time at which specific MPI communication events begin and end can be recorded, collected, and provided to a central scheduler. The infrastructure to create and collect these events has been implemented and tested, and a future architecture that can use this information is described.
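The idea of recording when a communication call begins and ends, without the application changing its own code, can be illustrated with a small sketch. This is not the paper's implementation, which modifies the MPI library and run-time environment themselves; it is a Python illustration under stated assumptions. The names `CommEvent`, `EVENT_LOG`, `timestamped`, `fake_send`, and `drain_events` are all hypothetical, and `fake_send` is only a stand-in for a blocking send.

```python
import time
from dataclasses import dataclass
from functools import wraps


@dataclass
class CommEvent:
    name: str     # which call was intercepted
    begin: float  # wall-clock time at entry (seconds since the epoch)
    end: float    # wall-clock time at exit


# Hypothetical per-process buffer that a collector would drain periodically.
EVENT_LOG = []


def timestamped(func):
    """Wrap a communication call so its begin/end times are logged."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        begin = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the event even if the wrapped call raises.
            EVENT_LOG.append(CommEvent(func.__name__, begin, time.time()))
    return wrapper


@timestamped
def fake_send(payload, dest):
    """Stand-in for a blocking send; sleeps to imitate transfer time."""
    time.sleep(0.01)
    return len(payload)


def drain_events():
    """Return the buffered events and clear the buffer."""
    events = list(EVENT_LOG)
    EVENT_LOG.clear()
    return events
```

The caller of `fake_send` sees the same return value it would see without the wrapper, which is the sense in which the monitoring is transparent; a separate collector could call `drain_events()` and forward the result to a scheduler.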
Similar resources
Hector: User-Transparent Resource Allocation for MPI
Hector, a complete job scheduling and parallel run-time environment, is intended to present many features both to parallel and sequential jobs, including dynamic load balancing, checkpointing, near-real-time resource awareness, and transparency to the programmer/user. This describes some recent work on user-transparent enhancements to support load balancing, near-real-time resource awareness,...
μπ: a scalable and transparent system for simulating MPI programs
μπ is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of μπ are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (r...
MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Message-Passing Middleware for Performance-Portable Parallel Computing
MPI has proven effective for parallel applications in situations with neither QoS nor fault handling. Emerging environments motivate fault-tolerant MPI middleware. Environments include space-based, wide-area/web/meta computing, and scalable clusters. MPI/FT, the system described here, trades off sufficient MPI fault coverage against acceptable parallel performance, based on mission requirem...
MPI/FT™: Architecture and Taxonomies for Fault-Tolerant, Message-Passing Middleware for Performance-Portable Parallel Computing
MPI has proven effective for parallel applications in situations with neither QoS nor fault handling. Emerging environments motivate fault-tolerant MPI middleware. Environments include space-based, wide-area/web/meta computing, and scalable clusters. MPI/FT, the system described here, trades off sufficient MPI fault coverage against acceptable parallel performance, based on mission requirements...
Lightweight monitoring of MPI programs in real time
Current technologies allow efficient data collection by several sensors to determine the overall evaluation of the status of a cluster. However, no previous work of which we are aware analyzes the behavior of the parallel programs themselves in real-time. In this paper, we perform a comparison of different artificial intelligence techniques that can be used to implement a lightweight monitoring...